Identifying Application Performance Limitations Associated with Microarchitecture Design
Abstract
This paper presents and employs a methodology to analyze application performance with respect to microarchitecture design. As demonstrated by a case study of Sweep3D, this methodology systematically evaluates application performance improvements associated with enhancements to a baseline processor architecture and, as a result, identifies a microarchitecture, derived from the baseline architecture, that embodies acceptable cost/performance tradeoffs while reducing stalls in the microarchitecture. In the process, architectural design choices that limit the application’s performance are recognized and design changes that will enhance the application’s performance are suggested. Design changes identified by the case study increase Sweep3D’s performance by 1.6% to 24%. This methodology builds the framework for defining the application’s performance threshold, which quantifies performance potential that is unattainable due to application characteristics, rather than architectural design choices.
Related resources
Identifying the best performing hardware platform based on inherent program similarity
An important problem in benchmarking is to identify the platform that yields the best performance for an application of interest. This paper proposes a methodology for doing this, using both microarchitecture-independent characteristics and genetic algorithms. We first compare the application of interest with the programs from a profiled benchmark suite. We subsequently make a performance predi...
A Model for Performance Analysis of MPI Applications on Terascale Systems
Profiling-based performance visualization and analysis of program execution is widely used for tuning and improving the performance of parallel applications. There are several profiler-based tools for effective application performance analysis and visualization. However, a majority of these tools are not equally effective for performance tuning of applications consisting of 100’s to 10,000’s of...
Designing OP2 for GPU architectures
OP2 is an “active” library framework for the solution of unstructured mesh applications. It aims to decouple the specification of a scientific application from its parallel implementation to achieve code longevity and near-optimal performance through re-targeting the back-end to different multi-core/many-core hardware. This paper presents the design of the current OP2 library for generating eff...
Self Modifying Circuitry - A Platform for Tractable Virtual Circuitry
The readily available performance advantages, gained in early virtual circuitry systems, are being recouped following advances in general purpose processor architectures and have resulted in a questioning of the tractability of applying virtual circuitry in a general software environment. This paper highlights two primary limitations of existing virtual circuitry systems: technical bandwidth li...
Statistically Rigorous Regression Modeling for the Microprocessor Design Space
Regression models enhance existing techniques in detailed microarchitectural simulation by reducing the number of required simulations and using simulation data more efficiently to identify trends and trade-offs. We present a rigorous derivation of such models for microprocessor performance and power prediction, emphasizing the need to apply domain-specific knowledge when performing statistical...
Publication date: 2001